can be projected around the user so that the person's actions 
are displayed around them. The video camera and projector 
operate on different wavelengths so that they do not interfere 
with each other. Uses for such a device include, but are not 
limited to, interactive lighting effects for people at clubs or 
events, interactive advertising displays, etc. Computer-generated 
characters and virtual objects can be made to react to the 
movements of passers-by. Other uses include interactive ambient 
lighting for social spaces such as restaurants, lobbies and parks, 
video game systems, interactive information spaces and art 
installations. Patterned illumination and brightness and gradient 
processing can be used to improve the ability to detect an object 
against a background of video images. 
INTERACTIVE VIDEO DISPLAY SYSTEM 

CROSS-REFERENCES TO RELATED APPLICATIONS 
[01] This application claims priority from co-pending U.S. Provisional Patent Application 
No. 60/296,189 filed June 5, 2001 entitled INTERACTIVE VIDEO DISPLAY SYSTEM 
THAT USES VIDEO INPUT which is hereby incorporated by reference, as if set forth in full 
in this document, for all purposes. 



BACKGROUND OF THE INVENTION 
[02] The present invention relates in general to image processing systems and more 
specifically to a system for receiving and processing an image of a human user to allow 
interaction with video displays. 

[03] Image processing is used in many areas of analysis, education, commerce and 
entertainment. One aspect of image processing includes human-computer interaction by 
detecting human forms and movements to allow interaction with images. Applications of 
such processing can use efficient or entertaining ways of interacting with images to define 
digital shapes or other data, animate objects, create expressive forms, etc. 
[04] Detecting the position and movement of a human body is referred to as "motion 
capture." With motion capture techniques, mathematical descriptions of a human 
performer's movements are input to a computer or other processing system. Natural body 
movements can be used as inputs to the computer to study athletic movement, capture data 
for later playback or simulation, enhance analysis for medical purposes, etc. 
[05] Although motion capture provides benefits and advantages, motion capture techniques 
tend to be complex. Some techniques require the human actor to wear special suits with 
high-visibility points at several locations. Other approaches use radio-frequency or other 
types of emitters, multiple sensors and detectors, blue-screens, extensive post-processing, etc. 
Techniques that rely on simple visible-light image capture are usually not accurate enough to 
provide well-defined and precise motion capture. 

[06] Some motion capture applications allow an actor, or user, to interact with images that 
are created and displayed by a computer system. For example, an actor may stand in front of 
a large video screen projection of several objects. The actor can move, or otherwise generate, 
modify, and manipulate, the objects by using body movements. Different effects based on 
an actor's movements can be computed by the processing system and displayed on the 
display screen. For example, the computer system can track a path of the actor in front of the 
display screen and render an approximation, or artistic interpretation, of the path onto the 
display screen. The images with which the actor interacts can be, e.g., on the floor, wall or 
other surface; suspended three-dimensionally in space; or displayed on one or more monitors, 
projection screens or other devices. Any type of display device or technology can be used to 
present images that a user can control or interact with. 

[07] In some applications, such as point-of-sale, retail advertising, promotions, arcade 
entertainment sites, etc., it is desirable to capture the motion of an untrained user (e.g., a 
person passing by) in a very unobtrusive way. Ideally, the user will not need special 
preparation or training and the system will not use unduly expensive equipment. Also, the 
method and system used to motion capture the actor should, preferably, be invisible or 
undetectable to the user. Many real-world applications must work in environments where 
there are complex and changing background and foreground objects, short time intervals for 
the capture, changing lighting conditions and other factors that can make motion capture 
difficult. 



BRIEF SUMMARY OF THE INVENTION 
[08] The present invention permits interaction between a user and a computer display 
system using the user's (or another object's) movement and position as input to a computer. 
The computer generates a display that responds to the user's position and movement. The 
generated display can include objects or shapes that can be moved, modified, or otherwise 
controlled by a user's body movements. 

[09] In a preferred embodiment of the invention, displayed images are affected by a user's 
actions in real-time. The display can be projected around the user so that the user's 
actions create effects that emanate from the user and affect display areas proximate to the 
user. Or the user can affect video objects such as by kicking, pushing, moving, deforming, 
touching, etc., items in video images. Interference between light used to display interactive 
images and light used to detect the user is minimized by using light of substantially different 
wavelengths. 

[10] In one embodiment, a user is illuminated with infrared light that is not visible to the 
human eye. A camera that is sensitive to infrared light is used to capture an image of the 
user for position and motion analysis. Visible light is projected by a projector onto a screen, 
glass or other surface to display interactive images, objects, patterns or other shapes and 
effects. The display surface can be aligned around the user so that their physical presence 
within the display corresponds to their virtual presence, giving the experience of physically 
touching and interacting with virtual objects. 
[11] One aspect of the invention can use patterned illumination instead of a simple, non-visible, 
uniform "floodlight." With patterned illumination, a pattern, such as a checkerboard, 
random dot pattern, etc., is projected. The pattern is used by processes executing on a 
computer to interpret a camera image and to detect an object from a background and/or other 
items in a scene. The pattern can be generated as a background (so that it does not impinge 
upon an object to be detected) or the pattern can be projected over all of the camera's 
viewable scene so that it illuminates background, foreground and objects to be detected and 
motion captured. 

[12] One way to achieve the patterned illumination includes using an infrared LED cluster 
or other non-visible light source in a slide projector. Another approach could use an infrared 
laser beam that is deflected, shuttered, scanned, etc., to produce a pattern. 
[13] Another way to achieve the patterned illumination is to use a regular "floodlight" but 
mark the aforementioned pattern onto the camera's viewable area using ink, dye, or paint that 
is either dark or highly reflective in the camera's receptive frequency. This ink, dye, or paint 
can be made invisible to the human eye so as to improve the aesthetics of the display. 
[14] Another aspect of the invention uses a gradient approach to determine object-image 
interaction. An "influence image" is created by creating a gradient aura, or gray scale 
transition, around a detected object. As the detected object moves, the gradient aura is 
calculated in real time. As the gradient aura impinges upon a video image or item, the 
brightness and gradient in the region of the impinged item are calculated. The strength and 
direction of interaction (e.g., a pushing of the item) is a function of the brightness and 
gradient, respectively, of the impinged region. 

[15] In one embodiment the invention provides a system for detecting an object and 
generating a display in response, the system comprising a first source, for outputting 
electromagnetic energy in a first wavelength range, a detector for detecting a reflection of the 
first source of electromagnetic energy from an object, a processor coupled to the detector for 
using the detected reflection to generate a display signal, a second source, for outputting 
electromagnetic energy at a second wavelength range, wherein the second source generates a 
visible display in response to the display signal, wherein the first and second wavelength 
ranges are different. 
[16] In another embodiment the invention provides a method for detecting an object in an 
image captured with a camera, the method comprising using a patterned illumination to 
illuminate a background differently from the object; and using a processing system to define 
the object apart from the background. 

[17] In another embodiment the invention provides a method for computing an interaction 
of an object with a video item, the method comprising using a processor to determine a 
gradient for the object; using a processor to determine a boundary for the video item; and 
identifying an interaction by using the gradient and the boundary. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[18] Fig. 1 shows a first configuration of a preferred embodiment using a co-located 
projector and camera; 

[19] Fig. 2 shows an overhead projection configuration; 

[20] Fig. 3 shows a rear-projection configuration; 

[21] Fig. 4 shows a side-projection configuration; 

[22] Fig. 5A illustrates a subject under uniform illumination; 

[23] Fig. 5B illustrates a background under random dot pattern illumination; 

[24] Fig. 5C illustrates a subject and background under random dot pattern illumination; 

[25] Fig. 5D shows a result of detecting a subject from a background using random dot pattern illumination; 

[26] Fig. 6A shows a human user interacting with a video object; and 
[27] Fig. 6B illustrates an influence image. 

DETAILED DESCRIPTION OF THE INVENTION 
[28] Several configurations of the invention are described below. In general, the present 
invention uses a first light source to illuminate a user, or another object. The first light source 
uses light that is not visible to humans. For example, infrared or ultraviolet light can be used. 
A camera that is sensitive to light at the wavelength range of the first light source is used to 
detect a user who is illuminated by the first light source. A computer (or other processing 
system) is used to process the detected object image and to generate images for display. A 
second light source (e.g., a projector, video screen, etc.) is used to display the generated 
display images to a human user or viewers. The displayed images are at wavelengths that 
minimize interference with the camera's object detection. Typically, the visible spectrum is 
used to display the images. 

[29] In a preferred embodiment, the display surrounds the user such that the user's virtual 
presence is aligned with the user's physical presence. Thus, the virtual scene on the display 
has a physical location around the user, and the user's movement within the display will 
cause identical movement of the user's representation within the virtual scene. For example, 
the user can impinge on a virtual object's physical location and know that this will cause their 
virtual representation to touch the virtual object in the computer system. The use of the term 
"touch" or "touching" in this specification does not mean physical contact between an object, 
such as a person, and an image item. Rather the notion of touching means that the object's 
position and action in physical space is translated into an effect in a generated image, 
including effects of moving items in the generated images. 

[30] Displayed images or items can include objects, patterns, shapes or any visual pattern, 
effect, etc. Aspects of the invention can be used for applications such as interactive lighting 
effects for people at clubs or events, interactive advertising displays, characters and virtual 
objects that react to the movements of passers-by, interactive ambient lighting for public 
spaces such as restaurants, shopping malls, sports venues, retail stores, lobbies and parks, 
video game systems, and interactive informational displays. Other applications are possible 
and are within the scope of the invention. 

[31] Fig. 1 shows a front-projection embodiment of the invention using a co-located 
camera and projector. In Fig. 1, a person 1 is illuminated by an infrared (or other non-visible 
light) lamp 2. The image of the person is photographed by an infrared (or other non-visible 
light) camera 3. This signal is transmitted real-time 4 to computer 5. The computer performs 
the object detection algorithm, and generates the video effect in real time. The effect is 
transmitted 6 real-time to video projector 7. The projector projects the resulting image onto a 
screen 8 or some other surface. The video effect is then displayed 9 on the screen, in real 
time, and aligned with the person. 

[32] Fig. 2 shows an overhead projection configuration of the system. Component 10 
includes the aforementioned system. Component 10 is shown mounted vertically here, but 
the camera, projector, and light source within 10 can also be mounted horizontally and then 
redirected downwards with a mirror. A person moving on the ground 11 can have the video 
signal projected onto the ground around them 12. The person's own shadow obscures a 
minimal amount of the image when the projector is directly overhead. 
[33] Figs. 3 and 4 show two more alternate configurations for the camera and projector. In 
both figures, camera 20 captures objects such as a person 22 in front of a screen 23. The 
angle viewed by the camera is shown at 21. In Fig. 3, projector 25 is placed behind the 
screen. The cast light 24 from the projector can be seen on the screen from both sides. In Fig. 4, 
projector 25 is at an oblique angle to the screen; its light cone 24 is shown. Both of these 
configurations make it more likely that there are no shadows obstructing the projected image. 
[34] As described in the configurations above, a video camera is used to capture the scene 
at a particular location for input into the computer. In most configurations of the device, the 
camera views part of the output video display. To prevent unwanted video feedback, the 
camera can operate at a wavelength that is not used by the video display. In most cases, the 
display will use the visible light spectrum. In this case, the camera must photograph in a non- 
visible wavelength, such as infrared, so that the video display output is not detected. 
[35] The scene being videotaped must be illuminated in light of the camera's wavelength. 
In the case of infrared, sources including sunlight, a heat lamp or infrared LEDs can be used 
to illuminate the scene. These lights can be positioned anywhere; however, the camera's 
view of spurious shadows from these lights can be minimized by placing the light source in 
proximity to the camera. A light source, such as one or more lights, can illuminate objects 
with a uniform lighting, as opposed to the patterned illumination discussed below. In a 
preferred embodiment, the video signal is exported in real-time to the computer. However, 
other embodiments need not achieve real-time or near-real-time and can process object or 
video images (i.e., display images) at times considerably prior to displaying the images. 
[36] This component is designed to be modular; any computer software that utilizes the 
video input from the previous component and outputs the results to a video display can be 
used here. 

[37] Most instances of this component will have two parts: the first part handles the 
detection of mobile objects from static background, while the second part utilizes the object 
information to generate a video output. Numerous instances of each part will be described 
here; these instances are simply meant to be examples, and are by no means exhaustive. 
[38] In the first part, the live image from the video camera is processed real-time in order 
to separate mobile objects (e.g. people) from the static background, regardless of what the 
background is. The processing can be done as follows: 

[39] First, input frames from the video camera are converted to grayscale to reduce the 
amount of data and to simplify the detection process. Next, they may be blurred slightly to 
reduce noise. 
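
By way of illustration only, the preprocessing step can be sketched as follows. The sketch assumes a frame is a list of rows of (r, g, b) tuples, uses a plain channel mean for the grayscale conversion, and uses a 3x3 box filter for the slight blur; the function names and the choice of filter are assumptions made for this example and are not specified above.

```python
def to_grayscale(frame):
    """Convert rows of (r, g, b) pixels to rows of brightness values (plain mean)."""
    return [[sum(px) / 3.0 for px in row] for row in frame]


def box_blur(gray):
    """3x3 box blur; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - 1), min(h, y + 2))
            xs = range(max(0, x - 1), min(w, x + 2))
            vals = [gray[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out
```

A single bright pixel is spread over its neighbours by the blur, which is what suppresses isolated noise before the background comparison.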
[40] Any object that does not move over a long period of time is presumed to be 
background; therefore, the system is able to eventually adapt to changing lighting or 
background conditions. A model image of the background can be generated by numerous 
methods, each of which examines the input frames over a range of time. In one method, the 
last several input frames (or a subset thereof) are examined to generate a model of the 
background, either through averaging, generating the median, detecting periods of constant 
brightness, or other heuristics. The length of time over which the input frames are examined 
determines the rate at which the model of the background adapts to changes in the input 
image. 
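
One possible form of the "last several frames" method is a per-pixel median over a fixed-length history, sketched below. The class name MedianBackground and the history parameter are hypothetical; the history length plays the role of the examination period described above, so a longer history adapts more slowly. The model method assumes at least one frame has been supplied.

```python
from collections import deque
from statistics import median


class MedianBackground:
    """Background model built from the per-pixel median of recent grayscale frames."""

    def __init__(self, history=5):
        # Only the most recent `history` frames are kept; older ones drop out.
        self.frames = deque(maxlen=history)

    def update(self, gray):
        self.frames.append(gray)

    def model(self):
        h, w = len(self.frames[0]), len(self.frames[0][0])
        return [[median(f[y][x] for f in self.frames) for x in range(w)]
                for y in range(h)]
```

With a history of three frames, a pixel that is bright in only one frame leaves the model unchanged, while a pixel that stays bright for two of the last three frames is absorbed into the background.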

[41] In another method, the background model is generated at each time step (or more 
infrequently) by computing a weighted average of the current frame and the background 
model from the previous time step. The weight of the current frame is relatively small in this 
calculation; thus, changes in the real background are gradually assimilated into the 
background model. This weight can be tuned to change the rate at which the background 
model adapts to changes in the input image. 
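
The weighted-average update can be written in a few lines. In this sketch the function name update_background and the default weight of 0.05 are assumptions chosen for illustration; the text above only requires that the weight of the current frame be relatively small.

```python
def update_background(model, frame, weight=0.05):
    """Blend the current frame into the background model.

    new_model = (1 - weight) * old_model + weight * frame, applied per pixel.
    A small weight makes the model adapt slowly; a larger weight adapts faster.
    """
    return [[(1.0 - weight) * m + weight * f for m, f in zip(mrow, frow)]
            for mrow, frow in zip(model, frame)]
```

Applied repeatedly to an unchanging scene, the model converges toward that scene, which is how a persistent change in the real background is gradually assimilated.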

[42] An object of interest is presumed to differ in brightness from the background. In 
order to find objects at each time step, the current video input is subtracted from the model 
image of the background. If the absolute value of this difference at a particular location is 
larger than a particular threshold, then that location is classified as an object; otherwise, it is 
classified as background. 
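
The classification rule above amounts to a per-pixel threshold on the absolute difference. A minimal sketch follows; the function name classify and the default threshold of 30 brightness levels are illustrative assumptions, and a practical threshold would be tuned to the camera and lighting.

```python
def classify(frame, background, threshold=30):
    """Return a mask that is True where a pixel is classified as object.

    A pixel is an object pixel when |frame - background| exceeds the threshold;
    otherwise it is classified as background.
    """
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```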

[43] The second part can be any program that takes the object/background classification of 
an image (possibly in addition to other data) as input, and outputs a video image based on this 
input, possibly in real time. This program can take on an infinite number of forms, and is 
thus as broadly defined as a computer application. For example, this component could be as 
simple as producing a spotlight in the shape of the detected objects, or as complicated as a 
paint program controlled through gestures made by people who are detected as objects. In 
addition, applications could use other forms of input, such as sound, temperature, keyboard 
input, etc., as well as additional forms of output, such as audio, tactile, virtual reality, aromatic, 
etc. 

[44] One major class of applications includes special effects that use the 
object/background classification as input. For example, stars, lines, or other shapes could be 
drawn in the output video image in a random portion of the locations that were classified as 
"object". These shapes could then be set to gradually fade away over time, so that people 
leave transient trails of shapes behind them as they move around. The following are 
examples of other effects in the same class: 

[45] - contours and ripples surrounding the objects 

[46] - a grid which is deformed by the presence of objects 

[47] - simulations of flame and wind, and other matrix convolutions applied to objects 

[48] - special effects that pulse to the beat of the music, which is detected separately 
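
As a hedged illustration of the fading-trail effect described before this list, the sketch below keeps an intensity image in which a random portion of the object pixels is lit at each time step and every pixel fades by a constant factor. The function name step_trails and the parameters spawn_prob and decay are hypothetical, as is the choice of an exponential fade.

```python
import random


def step_trails(trails, mask, spawn_prob=0.2, decay=0.9, rng=random):
    """Advance the trail effect by one time step.

    trails: rows of intensities in [0, 1]; mask: rows of booleans (True = object).
    Every pixel fades by `decay`; each object pixel is relit with probability
    `spawn_prob`, so moving objects leave shapes behind that gradually fade.
    """
    out = []
    for trow, mrow in zip(trails, mask):
        row = []
        for t, is_obj in zip(trow, mrow):
            t *= decay
            if is_obj and rng.random() < spawn_prob:
                t = 1.0
            row.append(t)
        out.append(row)
    return out
```

A renderer would draw a star, line or other shape at each pixel in proportion to its intensity.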

[49] Another major class of applications allows the real objects to interact with virtual 
objects and characters. For example, an image showing a group of ducklings could be 
programmed to follow behind any real object (e.g. a person) that walks in front of the display. 
[50] In addition, computer games that can be played by people moving in front of the 
camera form another class of applications. 

[51] However, this list is not exclusive; this component is designed to be programmable, 
and thus can run any application. 

[52] The output of the processing software from the previous component is displayed 
visually. Possible displays include, but are not limited to, video projectors, televisions, 
plasma displays, and laser shows. The displayed image can be aligned with the video 
camera's input range so that the video effects align with the locations of the people causing 
them. Since some configurations of the video camera can detect objects in non-visible light, 
the problem of the display interfering with the camera is avoided. 
[53] There are numerous possible configurations for the different components. For 
example, the camera and a video projector can be in the same location and pointed in the 
same direction. The camera and projector can then be pointed at a wall as shown in Fig. 1, 
pointed at the ground, redirected with a mirror as shown in Fig. 2, or pointed at any other 
surface. Alternatively, the projector could be placed behind the screen as shown in Fig. 3 so 
that the display is identical to the one in Fig. 1, but the person is no longer in the way of the 
projection, so they do not cast a shadow. The shadow could also be avoided by placing the 
projector at an oblique angle to the screen as shown in Fig. 4. The video display could also 
be a large-screen TV, plasma display, or video wall. While the aforementioned 
configurations all have the video display lined up with the video input, this is not necessary; 
the video display could be placed anywhere. The preceding list is not exhaustive; there are 
numerous additional possible configurations. 
[54] The overall system can be networked, allowing vision information and processing 
software state information to be exchanged between systems. Thus an object detected in the 
vision signal of one system can affect the processing software in another system. In addition, 
a virtual item in the display of one system could move to other systems. If the displays of 
multiple systems are aligned together so that they form a single larger display, then the 
multiple systems can be made to function as if they were a single very large system, with 
objects and interactions moving seamlessly across display boundaries. 
[55] One common problem with the vision system is that, in cases where there is 
uncontrollable ambient illumination (e.g. sunlight) of the camera's viewable area from a 
significantly different angle than the camera, objects cast shadows onto the background. If 
these shadows are strong enough, the vision system may mistake them for objects. These 
shadows can be detected and removed by strobing the camera's light source. By subtracting 
a camera input image with ambient light alone from a camera input image with both the 
ambient light and the camera's light, the system yields an image that captures the scene as if 
only the camera's light were being used, thus eliminating the detectable shadows from the 
ambient light. 
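
The strobing subtraction can be sketched as below. The sketch assumes two grayscale frames captured close enough in time that the scene has not moved, a linear sensor response, and no saturation; the function name strobe_difference is hypothetical.

```python
def strobe_difference(lit, ambient_only):
    """Subtract the ambient-only frame from the frame lit by ambient plus camera light.

    The result approximates the scene as lit by the camera's light alone, so
    shadows cast by the ambient source cancel out. Negative values caused by
    noise are clamped to zero.
    """
    return [[max(0, l - a) for l, a in zip(lrow, arow)]
            for lrow, arow in zip(lit, ambient_only)]
```

For example, if an ambient shadow darkens the middle pixel of a row from 80 to 20 and the camera's light adds 50 everywhere, the two frames are [80, 20, 80] and [130, 70, 130], and their difference is a uniform 50 with no trace of the shadow.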

[56] Additional accuracy in detecting objects with the images captured by the camera can 
be obtained by using patterned illumination or patterned markings. 

[57] One shortcoming of using a simple floodlight illumination system for computer vision 
is that if the colors of objects being viewed by the camera are very similar, then the objects 
can be very difficult to detect. If the camera operates in monochrome it is much more likely 
for the object and background to look the same. 

[58] Using a patterned object to cover the camera's viewable area can improve object 
detection. If a pattern that contains two or more colors intermingled in close proximity is 
used, it is highly unlikely that other objects will have a similar look since at least one color of 
the pattern will look different from the color of surrounding objects. If a patterned object, 
such as a screen, is used as a background before which are the objects to be detected, then 
objects that pass in front of the patterned screen are more easily detected by the vision 
algorithm. 

[59] For example, in an infrared vision application the patterned object could be a 
background mat that appears white to the human eye, but contains a light and dark checkered 
pattern that is invisible to the human eye but visible to the camera. By using a pattern that is 
not in the visible light spectrum, the patterned mat will not interfere with the aesthetics of the 
system. The display system (e.g., projection video) can project output images onto the mat, 
as described above. A process executing on a processing system such as a computer system 
can be provided with the background pattern, thus making detection of an object in front of 
the mat easier, although the system could learn the patterned background in the same way 
that the vision algorithm learns any other background. Also, the ability of the system to adapt 
to changes in background light brightness would not be adversely affected. 
[60] A patterned illumination can also be projected from a light source onto the camera's 
viewable area. As long as the camera and invisible light source are in different, offset 
locations, parallax effects will cause the camera's view of the projected pattern to be distorted 
as objects move through the camera's viewing area. This distortion helps make objects that 
have similar colors stand out from each other. If the difference between the two images seen 
by the camera is taken, the result will show the shape of any object that has appeared, 
disappeared, or moved between the two images. If the image of an object in front of the 
background is subtracted from an image of the background alone, the result is an image that 
is zero where there is background and nonzero where there are other objects. This technique 
can be used in combination with other aspects of the invention discussed herein. 
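
The effect of parallax on a projected pattern can be simulated in a simplified form. In the sketch below, which is an assumption-laden toy model rather than an optical simulation, an object occupying a band of columns makes the pattern inside that band appear shifted sideways, and the difference against the background-only image is zero outside the band. The names random_dots, with_object and difference are hypothetical.

```python
import random


def random_dots(h, w, seed=0):
    """Binary random dot pattern with pixel values 0 or 255."""
    rng = random.Random(seed)
    return [[rng.choice((0, 255)) for _ in range(w)] for _ in range(h)]


def with_object(pattern, x0, x1, shift):
    """Simulate parallax: inside columns [x0, x1) the pattern appears shifted."""
    out = [row[:] for row in pattern]
    for row_out, row in zip(out, pattern):
        for x in range(x0, x1):
            row_out[x] = row[(x + shift) % len(row)]
    return out


def difference(a, b):
    """Per-pixel absolute difference of two images of equal size."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Calling difference(random_dots(8, 16), with_object(random_dots(8, 16), 5, 10, shift=3)) yields an image that is zero in the background columns and nonzero at many pixels of the object band, even though the object has no brightness of its own in this model.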
[61] A patterned light source can be achieved through several means. One method is to 
use an infrared light-emitting diode (LED) cluster or another non-visible light source in a 
slide projector. A set of lenses would be used to focus the light source through a slide 
containing the desired pattern, thus casting the pattern's image onto the camera's viewing 
area. In another method, an infrared laser beam could be shined onto a laser pattern generator 
or other scattering device in order to produce a light pattern on the camera's viewing area. 
Light can be deflected, shuttered, scanned, etc., in order to achieve a pattern. Many other 
approaches are possible. 

[62] A patterned light source is also useful for 3-D computer vision. 3-D computer vision 
techniques such as the Marr-Poggio algorithm take as input two images of the same scene 
taken from slightly different angles. The patterns on the images are matched up to determine 
the amount of displacement, and hence the distance from the camera, at each point in the 
image. The performance of this algorithm degrades when dealing with objects of uniform 
color because uniform color makes it difficult to match up the corresponding sections in the 
image pair. Thus, the patterned light source can improve the distance estimates of some 3-D 
computer vision algorithms. 
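
To illustrate why texture matters for matching, the sketch below uses simple one-dimensional block matching with a sum of absolute differences. This is a swapped-in, much simpler technique than the Marr-Poggio algorithm named above and is offered only as an illustration; the function name best_disparity and the parameters half and max_disp are hypothetical, and the caller must choose x so that all window indices are in range.

```python
def best_disparity(left, right, x, half=2, max_disp=4):
    """Find the shift d in [0, max_disp] that best aligns left[x] with right[x - d].

    The cost of a shift is the sum of absolute differences over a window of
    2 * half + 1 pixels. Ties are resolved in favour of the smallest shift.
    """
    def cost(d):
        return sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half, half + 1))
    return min(range(max_disp + 1), key=cost)


# A patterned row and the same row as seen with a displacement of 3 pixels.
dots = [0, 255, 255, 0, 255, 0, 0, 0, 255, 255,
        255, 0, 255, 0, 0, 255, 0, 255, 255, 0]
shifted = [dots[(i + 3) % len(dots)] for i in range(len(dots))]
flat = [128] * len(dots)  # uniform surface: every shift matches equally well
```

Here best_disparity(dots, shifted, 10) recovers the displacement of 3, whereas best_disparity(flat, flat, 10) returns 0 only because every shift has the same cost, so no displacement information is available from the uniform row.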

[63] The two input images to these 3-D vision algorithms are usually generated using a 
pair of cameras pointed at the scene. However, it would also be possible to use only one 
camera. The second image could be an entirely undistorted version of the projected pattern, 
which is known ahead of time. This image of the pattern is essentially identical to what a 
second camera would see if it were placed at the exact same location as the patterned light 
source. Thus, the single camera's view and the projected pattern together could be used as an 
input to the 3-D vision algorithm. Alternatively, the second image could be an image of the 
background alone, taken from the same camera. 

[64] While many different kinds of patterns can be used, a high-resolution random dot 
pattern has certain advantages for both 2-D and 3-D vision. Due to the randomness of the dot 
pattern, each significantly sized section of the dot pattern is highly unlikely to look like any 
other section of the pattern. Thus, the displaced pattern caused by the presence of an object 
in the viewing area is highly unlikely to look similar to the pattern without the object there. 
This maximizes the ability of the vision algorithm to detect displacements in the pattern, and 
therefore objects. Using a regular pattern such as a grid can cause some difficulty because 
different sections of the pattern are identical, causing the displaced pattern to often look like 
the non-displaced pattern. 
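
The difficulty with regular patterns can be shown with a small check that lists the displacements at which a row of the pattern looks identical to itself. The function name self_matching_shifts and the two sample rows are assumptions made for this example; a cyclic shift is used for simplicity.

```python
def self_matching_shifts(row, max_shift):
    """Shifts in 1..max_shift at which the cyclically shifted row equals the original."""
    n = len(row)
    return [s for s in range(1, max_shift + 1)
            if all(row[i] == row[(i + s) % n] for i in range(n))]


grid_row = [255, 0, 0, 0] * 5          # regular pattern with a period of 4 pixels
dot_row = [0, 255, 255, 0, 255, 0, 0, 0, 255, 255,
           255, 0, 255, 0, 0, 255, 0, 255, 255, 0]  # irregular dot pattern
```

The grid row matches itself at shifts of 4 and 8 pixels, so a displacement by one period would go undetected, while the irregular row has no self-matching shift in the same range.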

[65] Figs. 5A-D show the usefulness of a random dot pattern in detecting an object. Fig. 
5A shows a picture of a person under normal illumination. The person has a similar 
brightness to the background, making detection difficult. In Fig. 5B, a random dot pattern is 
projected onto the background from a light source near the camera. When the person stands 
in front of this pattern, the pattern reflected off of the person is displaced, as shown in Fig. 
5C. By taking the difference between the images in Figs. 5B and 5C, the image of Fig. 5D is 
obtained which defines the image area of the person with a strong signal. 
[66] Other approaches can be used to improve object detection. For example, a light 
source can be "strobed" or turned on-and-off periodically so that detection of shadows due to 
other light sources (e.g., ambient light) is made easier. 

[67] Once an object has been detected and defined, the preferred embodiment uses a 
gradient aura to determine degree and direction of interaction of the object with a displayed 
image item. 

[68] Fig. 6A shows a human user interacting with a video object. 

[69] In Fig. 6A, object 304 has been detected and is shown in outline form. One 
representation of the object within a computer's processing can use the outline definition 
depicted in Fig. 6A. Video screen 302 displays several image items, such as image 306 of a 
ball. 

[70] Fig. 6B illustrates an influence image for the region of 308 of Fig. 6A. 
[71] In Fig. 6B, the outline image of the user's foot 320 and lower leg is used to generate 
successively larger outline areas. The original outline area 320's region is assigned a large 
pixel brightness value corresponding to white. Each successive outline area, 322, 324, 326, 
328, 330 is assigned a progressively lower value so that a point farther away from the initial 
outline (white) area will have a lower pixel value. Note that any number of outline areas can 
be used. Also, the size and increments of outline areas can vary, as desired. For example, it 
is possible to use a continuous gradient, rather than discrete areas. The collection of all 
outline areas is referred to as the "influence image." 
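
One way to generate such an influence image from an object mask is to grow the outline one pixel at a time and assign each new ring a lower value, as sketched below. The function name influence_image, the use of 4-connected one-pixel rings, and the evenly spaced brightness steps are assumptions for illustration; as noted above, the number, size and increments of the outline areas can vary.

```python
def influence_image(mask, rings=5, top=255):
    """Build an influence image from a boolean object mask.

    Object pixels receive `top`; each successive one-pixel ring around the
    object receives a progressively lower value; pixels beyond the last ring
    stay at 0.
    """
    h, w = len(mask), len(mask[0])
    img = [[top if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    frontier = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    step = top // (rings + 1)
    for ring in range(1, rings + 1):
        value = top - ring * step  # always positive, so 0 marks "not yet assigned"
        nxt = []
        for y, x in frontier:
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                j, i = y + dy, x + dx
                if 0 <= j < h and 0 <= i < w and img[j][i] == 0:
                    img[j][i] = value
                    nxt.append((j, i))
        frontier = nxt
    return img
```

For a single-row mask with the object at the left edge and eight pixels in total, the result is [255, 213, 171, 129, 87, 45, 0, 0], a gray scale transition that falls off with distance from the object.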

[72] The influence image is compared to different image items. In Fig. 6B, ball item 306 
impinges upon gradient areas 326, 328 and 330. As is known in the art, direction lines are 
determined in the direction of the gradient of the pixel value field for the impinged areas. 
Fig. 6B shows three example direction lines within item 306. The direction lines can be 
combined, e.g., by averaging, or a select single line can be used. The processing also detects 
that the brightest outline area impinged by the item is outline area 326. Other approaches are 
possible. For example, the brightness and gradient can be averaged over every point in the 
area of the image item, or on a subset of those points. Also, some embodiments can include 
duration of contact as a factor in addition to the brightness and gradient. 
[73] The interaction between an object, such as a person, and an item on the screen is 
computed using both the brightness of impinged outline areas and the direction as computed 
using one or more direction lines. The impinged brightness corresponds to the strength with 
which the user is "touching" the item. The gradient corresponds to the direction in (or from,
depending on the sign of the calculation) which the item is being touched. 
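
The computation of paragraphs [72] and [73] can be sketched as follows, under stated assumptions: the strength is taken as the brightest influence value under the item, and the direction is taken as the average of per-pixel gradients estimated by central differences. The function name `interaction` and the representation of the item as a list of (row, column) pixels are hypothetical.

```python
def interaction(influence, item_pixels):
    """Return (strength, (gx, gy)) for an item overlapping an influence image.

    influence:   2D list of brightness values
    item_pixels: non-empty list of (row, column) positions covered by the item
    strength:    brightest influence value impinged by the item
    (gx, gy):    gradient averaged over the item, pointing toward brighter
                 values, i.e. toward the object
    """
    height, width = len(influence), len(influence[0])
    strength = 0
    gx_sum = gy_sum = 0.0
    for y, x in item_pixels:
        strength = max(strength, influence[y][x])
        # Central differences, clamped at the image border.
        x0, x1 = max(x - 1, 0), min(x + 1, width - 1)
        y0, y1 = max(y - 1, 0), min(y + 1, height - 1)
        gx_sum += (influence[y][x1] - influence[y][x0]) / max(x1 - x0, 1)
        gy_sum += (influence[y1][x] - influence[y0][x]) / max(y1 - y0, 1)
    count = len(item_pixels)
    return strength, (gx_sum / count, gy_sum / count)


# Influence values falling off to the right of an object at the left edge.
influence = [[255, 213, 171, 129, 87, 45, 0, 0] for _ in range(3)]
# A hypothetical item covering three pixels of the middle row.
strength, (gx, gy) = interaction(influence, [(1, 3), (1, 4), (1, 5)])
# strength == 129, gx == -42.5, gy == 0.0
```

Here the gradient points in the negative x direction, toward the object, so an item pushed away from the user would move along the negated gradient. Which sign is used is a design choice, consistent with the "in (or from)" wording above.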

[74] Although the invention has been discussed with reference to specific embodiments 
thereof, these embodiments are illustrative, not restrictive, of the invention. For example, 
although the preferred embodiments use a camera as a detector, different types of detection 
devices can be employed. The camera can be digital or analog. A stereo camera could be 
used in order to provide depth information as well as position. In cases where processing and 
display are not done in real time, film and other types of media can be used and followed up 
by a digital conversion before inputting the data to a processor. Light sensors or detectors 
can be used. For example, an array of photodetectors can be used in place of a camera. 
Other detectors not contemplated herein can be used with suitable results. 
[75] In general, any type of display device can be used with the present invention. For 
example, although video devices have been described in the various embodiments and 
configurations, other types of visual presentation devices can be used. A light-emitting diode
(LED) array, organic LED (OLED), light-emitting polymer (LEP), electromagnetic, cathode 
ray, plasma, mechanical or other display system can be employed. 
[76] Virtual reality, three-dimensional or other types of displays can be employed. For 
example, a user can wear imaging goggles or a hood so that they are immersed within a 
generated surrounding. In this approach, the generated display can align with the user's 
perception of their surroundings to create an augmented, or enhanced, reality. One 
embodiment may allow a user to interact with an image of a character. The character can be 
computer generated, played by a human actor, etc. The character can react to the user's 
actions and body position. Interactions can include speech, co-manipulation of objects, etc. 
[77] Multiple systems can be interconnected via, e.g., a digital network. For example, 
Ethernet, Universal Serial Bus (USB), IEEE 1394 (Firewire), etc., can be used. Wireless 
communication links such as defined by 802.11b, etc., can be employed. By using multiple
systems, users in different geographic locations can cooperate, compete, or otherwise interact 
with each other through generated images. Images generated by two or more systems can be 
"tiled" together, or otherwise combined to produce conglomerate displays.
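
As a simple illustration of tiling, two images of equal height produced by separate systems could be joined side by side. The function name `tile_horizontal` and the list-of-rows image representation are assumptions for this sketch.

```python
def tile_horizontal(left, right):
    """Join two images (2D lists of pixel values) side by side."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    return [left_row + right_row for left_row, right_row in zip(left, right)]


combined = tile_horizontal([[1, 2], [3, 4]], [[5], [6]])
# combined == [[1, 2, 5], [3, 4, 6]]
```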
[78] Other types of illumination, as opposed to light, can be used. For example, radar 
signals, microwave or other electromagnetic waves can be used to advantage in situations 
where an object to detect (e.g., a metal object) is highly reflective of such waves. It is 
possible to adapt aspects of the system to other forms of detection such as by using acoustic 
waves in air or water. 

[79] Although computer systems have been described to receive and process the object
image signals and to generate display signals, any other type of processing system can be 
used. For example, a processing system that does not use a general-purpose computer can be 
employed. Processing systems using designs based upon custom or semi-custom circuitry or 
chips, application specific integrated circuits (ASICs), field-programmable gate arrays 
(FPGAs), multiprocessor, asynchronous or any type of architecture design or methodology 
can be suitable for use with the present invention. 

[80] Thus, the scope of the invention is to be determined solely by the appended claims. 

WHAT IS CLAIMED IS:

1. A system for detecting an object and generating a display in response,
the system comprising

a first source, for outputting electromagnetic energy in a first wavelength
range;

a detector for detecting a reflection of the first source of electromagnetic
energy from an object;

a processor coupled to the detector for using the detected reflection to generate
a display signal;

a second source, for outputting electromagnetic energy at a second wavelength
range, wherein the second source generates a visible display in response to the display signal,
wherein the first and second wavelength ranges are different.

2. The system of claim 1, wherein the first source outputs light that is not 
in the visible spectrum and wherein the second source outputs light that is in the visible 
spectrum. 

3. The system of claim 2, wherein the first source outputs infrared light.

4. The system of claim 1, wherein the second source includes a video
projector.

5. The system of claim 4, wherein the video projector projects images 
from above the object. 

6. The system of claim 4, wherein the video projector projects images on
a surface adjacent to the object.

7. The system of claim 6, wherein the surface is part of a rear-projection
system.

8. The system of claim 6, wherein the surface is part of a front-projection
system.

9. The system of claim 1, wherein the object includes a human user.

10. The system of claim 1, wherein the first source includes a pattern of
illumination.

11. The system of claim 10, wherein the pattern includes a random dot
pattern.

12. The system of claim 10, further comprising

an infrared light-emitting diode cluster for generating the pattern of
illumination.

13. The system of claim 1, further comprising

a process for determining an influence image, wherein the influence image
includes a region around an object image derived from the object.

14. The system of claim 13, wherein the visible display includes an item,
the system further comprising

a process for determining a gradient of the influence image; and

a process for using the gradient to determine interaction between the object
and the item.

15. The system of claim 1, wherein the visible display includes a 
rendering of a character. 

16. A method for obtaining an image of a human user wherein the human 
user is adjacent to a displayed image, the method comprising 

using light at a first wavelength to illuminate the human user; 
using a camera responsive to the light at a first wavelength to detect the image 
of the human user; and 

using light at a second wavelength, different from the first wavelength, to 
generate the displayed image. 

17. The method of claim 16, wherein multiple cameras are used.

18. The method of claim 17, wherein at least two cameras are used to
produce a stereo effect.

19. The method of claim 16, wherein a plasma screen is used to generate
the displayed image.

20. The method of claim 16, wherein the displayed image includes
advertising.

21. The method of claim 16, wherein the displayed image is part of a video
game.

22. The method of claim 16, further comprising 
strobing the light at a first wavelength. 

23. The method of claim 16, further comprising 

interconnecting multiple systems so that information about image detection 
and displays can be transferred between the systems. 

24. The method of claim 23, further comprising 

using the transferred information to create a single display from two or more 
displays. 

25. A method for detecting an object in an image captured with a camera, 
the method comprising 

using a patterned illumination to illuminate a background and not the object; and

using a processing system to define the object apart from the background.

26. The system of claim 25, wherein the patterned illumination includes a 
random dot pattern. 

27. The system of claim 25, wherein the patterned illumination includes a 
checkerboard pattern. 

28. The system of claim 25, wherein the patterned illumination also 
illuminates the object. 

29. A method for detecting an object in an image captured with a camera, 
the method comprising 

using a patterned background; and 

using a processing system to define the object apart from the background.

30. A method for computing an interaction of an object with a video item, 
the method comprising 

using a processor to determine a gradient for the object; 

using a processor to determine a boundary for the video item; and 

identifying an interaction by using the gradient and the boundary. 

31. The system of claim 30, further comprising

using a processor to determine a brightness of an area derived from the object; and

identifying an interaction by using the brightness of the area and the boundary.

32. The system of claim 30, wherein the interaction is a person pushing the
item.

33. The system of claim 30, wherein the interaction is a person touching
the item.

34. The system of claim 30, wherein the interaction is a person deforming
the item.

35. The system of claim 30, wherein the interaction is a person
manipulating the item.

36. A system for detecting an object and generating an output in response, 
the system comprising 

a first source, for outputting electromagnetic energy in a first wavelength
range;

a detector for detecting a reflection of the first source of electromagnetic
energy from an object;

a processor coupled to the detector for using the detected reflection to generate
a display signal;

a second source, for outputting electromagnetic energy at a second wavelength
range, wherein the second source generates a visible display in response to the display signal,
wherein the first and second wavelength ranges are different; and

audio output for outputting audio.